unified: use a vendored-in copy of tree-sitter-swift #21819

Open
tausbn wants to merge 2 commits into main from tausbn/unified-vendor-in-tree-sitter-swift

tausbn commented May 8, 2026

For ease of iteration on the prototype.

tausbn added the no-change-note-required label (This PR does not need a change note) May 8, 2026
tausbn marked this pull request as ready for review May 8, 2026 15:33
Copilot AI review requested due to automatic review settings May 8, 2026 15:33
tausbn requested review from a team as code owners May 8, 2026 15:33
tausbn requested a review from asgerf May 8, 2026 15:33

Copilot AI left a comment

Pull request overview

This PR vendors the tree-sitter-swift grammar into unified/extractor to make iterating on the unified extractor’s Swift parsing prototype easier, and updates Rust/Bazel wiring to use the in-repo copy instead of the crates.io package.

Changes:

  • Add a vendored unified/extractor/tree-sitter-swift crate including generated parser sources, scanner, queries, and build scripts.
  • Switch unified/extractor from the registry tree-sitter-swift dependency to a local path dependency and add the new crate to the workspace.
  • Update Bazel third-party wiring to remove the crates.io tree-sitter-swift archive and add required deps (cc, tree-sitter-language) for the new local crate.
| File | Description |
| --- | --- |
| `unified/extractor/tree-sitter-swift/tree-sitter.json` | Adds tree-sitter grammar metadata/config for Swift. |
| `unified/extractor/tree-sitter-swift/src/tree_sitter/parser.h` | Vendors tree-sitter parser API header used by generated sources/scanner. |
| `unified/extractor/tree-sitter-swift/src/tree_sitter/array.h` | Vendors tree-sitter internal array utilities used by generated sources. |
| `unified/extractor/tree-sitter-swift/src/tree_sitter/alloc.h` | Vendors tree-sitter allocator abstraction header. |
| `unified/extractor/tree-sitter-swift/src/scanner.c` | Adds Swift external scanner implementation (comments/raw strings/semi handling/etc.). |
| `unified/extractor/tree-sitter-swift/README.md` | Vendors upstream README for the grammar. |
| `unified/extractor/tree-sitter-swift/queries/textobjects.scm` | Adds textobject queries for Swift. |
| `unified/extractor/tree-sitter-swift/queries/tags.scm` | Adds tags queries for symbol definitions. |
| `unified/extractor/tree-sitter-swift/queries/outline.scm` | Adds outline queries for structure extraction. |
| `unified/extractor/tree-sitter-swift/queries/locals.scm` | Adds locals queries (definitions/scopes). |
| `unified/extractor/tree-sitter-swift/queries/injections.scm` | Adds injection queries (regex/comment injections). |
| `unified/extractor/tree-sitter-swift/queries/indents.scm` | Adds indentation queries. |
| `unified/extractor/tree-sitter-swift/queries/highlights.scm` | Adds syntax highlighting queries. |
| `unified/extractor/tree-sitter-swift/queries/folds.scm` | Adds folding queries. |
| `unified/extractor/tree-sitter-swift/package.json` | Vendors upstream Node package metadata for the grammar. |
| `unified/extractor/tree-sitter-swift/LICENSE` | Adds upstream MIT license for the vendored grammar. |
| `unified/extractor/tree-sitter-swift/grammar.js` | Vendors the Swift grammar definition. |
| `unified/extractor/tree-sitter-swift/Cargo.toml` | Adds a local Rust crate wrapper for the vendored Swift grammar. |
| `unified/extractor/tree-sitter-swift/BUILD.bazel` | Adds Bazel rules to build the vendored grammar as a Rust library. |
| `unified/extractor/tree-sitter-swift/bindings/rust/lib.rs` | Provides LanguageFn and embeds node-types/queries; includes basic tests. |
| `unified/extractor/tree-sitter-swift/bindings/rust/build.rs` | Builds parser.c + scanner.c via cc during Rust builds. |
| `unified/extractor/tree-sitter-swift/bindings/node/index.js` | Vendors Node binding loader. |
| `unified/extractor/tree-sitter-swift/bindings/node/binding.cc` | Vendors Node binding implementation exporting the language. |
| `unified/extractor/tree-sitter-swift/binding.gyp` | Vendors Node-gyp build configuration for the Node binding. |
| `unified/extractor/Cargo.toml` | Switches tree-sitter-swift dependency to local path. |
| `unified/extractor/BUILD.bazel` | Adds the new local tree-sitter-swift Bazel target as a dependency. |
| `MODULE.bazel` | Adds Bazel module repos for cc and tree-sitter-language; removes crates.io tree-sitter-swift repo. |
| `misc/bazel/3rdparty/tree_sitter_extractors_deps/defs.bzl` | Removes vendored crates.io tree-sitter-swift archive; adds mappings for the new local crate + its deps. |
| `misc/bazel/3rdparty/tree_sitter_extractors_deps/BUILD.tree-sitter-swift-0.7.2.bazel` | Deletes the autogenerated BUILD file for the removed crates.io tree-sitter-swift dependency. |
| `misc/bazel/3rdparty/tree_sitter_extractors_deps/BUILD.bazel` | Adds aliases for cc and tree-sitter-language. |
| `Cargo.toml` | Adds the vendored tree-sitter-swift crate as a workspace member. |
| `Cargo.lock` | Converts tree-sitter-swift from registry source to a workspace package entry (removes source/checksum). |

Copilot's findings

Comments suppressed due to low confidence (1)

unified/extractor/tree-sitter-swift/Cargo.toml:22

  • This crate’s Rust tests and doctest example reference the tree_sitter crate (tree_sitter::Parser), but Cargo.toml doesn’t declare a tree-sitter dependency (and it can’t be used transitively). Add an explicit tree-sitter dependency (or at least a dev-dependency) so cargo test/doctests compile.
# When updating these dependencies, run `misc/bazel/3rdparty/update_cargo_deps.sh`
[dependencies]
tree-sitter-language = "0.1"

[build-dependencies]
cc = "1.2"

  • Files reviewed: 31/35 changed files
  • Comments generated: 1

Comment on lines +705 to +711
#define DIRECTIVE_COUNT 4
const char* DIRECTIVES[OPERATOR_COUNT] = {
"if",
"elseif",
"else",
"endif"
};
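
Note that the quoted lines define `DIRECTIVE_COUNT` but size `DIRECTIVES` with `OPERATOR_COUNT`, whose value is not shown here. Below is a minimal sketch of one way to keep the array and its count in sync, assuming the intent is a four-entry table. `directive_index` is a hypothetical helper added only for illustration and is not taken from `scanner.c`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same four directive names as in the quoted lines. */
#define DIRECTIVE_COUNT 4

/* Leave the array unsized so its length comes from the initializer list. */
static const char *const DIRECTIVES[] = {
    "if",
    "elseif",
    "else",
    "endif",
};

/* Compile-time check that the initializer list matches DIRECTIVE_COUNT. */
_Static_assert(sizeof(DIRECTIVES) / sizeof(DIRECTIVES[0]) == DIRECTIVE_COUNT,
               "DIRECTIVES must have exactly DIRECTIVE_COUNT entries");

/* Hypothetical helper (assumption, not from scanner.c):
 * returns the index of `name` in DIRECTIVES, or -1 if it is not listed. */
static int directive_index(const char *name) {
    assert(name != NULL);
    for (size_t i = 0; i < DIRECTIVE_COUNT; i++) {
        if (strcmp(DIRECTIVES[i], name) == 0) {
            return (int)i;
        }
    }
    return -1;
}
```

With the array left unsized and its length asserted against `DIRECTIVE_COUNT`, a mismatch between the two fails at compile time instead of silently padding a larger array with null pointers.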

asgerf commented May 11, 2026

This adds 600k lines of code, of which 550k come from the auto-generated parser.c and 30k from the generated node-types.json.

I think we should go against tree-sitter conventions and avoid checking in these generated artifacts, and instead rely on Bazel rules to rebuild them when needed. WDYT?

Labels

documentation no-change-note-required This PR does not need a change note

3 participants